Abstract

In this work, two different methods for extracting the mass of a new quark from (pseudo-)data
are compared: the classical cut-based method and the matrix element method. As a concrete
example, a fourth family up-type quark is searched for in p-p collisions at 7 TeV center-of-mass
energy. We show that, even with a very small number of events, the Matrix Element Method gives
better estimates of the mass value and its error, especially for event samples in which the
signal-to-background ratio is greater than 0.2.



I. INTRODUCTION: 

In searches for new phenomena at particle physics experiments, it is very important
to extract the values of unknown parameters with maximal statistical significance from
small data samples. For this purpose, the Matrix Element Method (MEM) provides a very
powerful tool; it gave the most precise value of the top quark mass at the Tevatron
experiments D0 and CDF [1,2,3,4]. As the method became more popular, it was also applied
to other analyses, such as electroweak single top quark production [5], estimation of the
longitudinal W boson helicity fraction in top quark decays [6] and searches for the Higgs
boson [7]. It can be applied to any mass analysis involving exclusive decay channels at
hadron colliders in BSM searches. In this paper, a brief description of the method is
presented, followed by a comparison of the results of a heavy quark search analysis using a
traditional cut-based method with those from the matrix element method.

A. Matrix Element Method: 

The name Matrix Element Method comes from the fact that the probability function used in
this method is derived from the physical matrix element. The method uses both theoretical
and experimental information to extract the values of unknown parameters from the
experimental data. The essential point of the MEM is therefore that it makes maximal use of
the information contained in the physics of the problem, without trying to extract it from
distributions as in the cut-and-count method. In this method, each experimentally measured
event $x$ is associated with a Bayesian probability function $P(x|\alpha)$, which gives the
probability to observe this event in a certain theoretical frame $\alpha$. The probability
weight, which is based on the squared matrix element [8,9], can be written in the
following form [10,11]:

$$ P(x|\alpha) = \frac{1}{\sigma_\alpha} \int d\phi(y)\, dw_1\, dw_2\, f_1(w_1)\, f_2(w_2)\, |M_\alpha(y)|^2\, W(x, y) \tag{I.1} $$

where $x$ is a set of detector-level kinematic quantities, $y$ is the set of parton-level
4-vectors, $\sigma_\alpha$ is the parton-level cross-section (the $1/\sigma_\alpha$ factor
ensures the normalization of the probability), $M_\alpha$ is the matrix element describing
the production and decay process, $f_1(w_1)$ and $f_2(w_2)$ are the parton distribution
functions, $d\phi(y)$ is the phase-space element and $W(x, y)$ is the transfer function
(or resolution function), which describes the probability density to reconstruct an assumed
partonic final state $y$ as a measurement $x$ in the detector.

The probability is obtained by integrating over all possible parton states, with each
configuration weighted according to its probability to produce the observed measurement.
The weights are then combined into a likelihood to determine the most probable value
of the parameter of interest (top quark mass, W helicity, etc.).

The likelihood function for N measured events can be written as:

$$ L(\alpha) = e^{-N \int P(x, \alpha)\, dx} \prod_{i=1}^{N} P(x_i, \alpha) \tag{I.2} $$

where $\alpha$ is any parameter to be estimated and $P(x_i, \alpha)$ is the probability
density evaluated for the measured event $x_i$. The derivation of the likelihood can be
found in [12]. The best value of $\alpha$ is obtained by maximizing the likelihood or, more
practically, by minimizing $-\ln L(\alpha)$ with respect to $\alpha$.
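
As a short worked step, taking the negative logarithm of the likelihood above turns the product over events into a sum, which is the form usually handed to a numerical minimizer. The symbol $\alpha$ for the parameter follows the surrounding text; nothing beyond the likelihood definition is assumed.

```latex
-\ln L(\alpha) = N \int P(x, \alpha)\, dx \;-\; \sum_{i=1}^{N} \ln P(x_i, \alpha)
```

If $P$ integrates to unity over the accepted phase space for every $\alpha$, the first term is a constant and only the sum over events depends on $\alpha$.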

1. Transfer Functions: 

The determination of the transfer functions (TFs) is the most important part of the matrix
element method. As mentioned before, transfer functions map parton-level quantities to
detector-level measured quantities, or vice versa. The energy resolution of leptons and jets
is parametrized with transfer functions $W(\Delta E = E_{parton} - E_{jet})$, which give the
probability for a measurement $E_{jet}$ in the detector if the true object energy is
$E_{parton}$. TFs can be decomposed into a product of functions, one for each external or
internal particle, and each part can be handled separately. Although different types of TFs
can be found in various analyses, the most widely used one for jets is Canelli's double
Gaussian formulation [13]: one Gaussian for the symmetric peak, while the other accommodates
the asymmetric tails of the $\Delta E$ distribution. In this formulation the jet transfer
function is expressed as a function only of the energy difference between the parton and
the jet:
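
The double Gaussian is commonly quoted in the following form. It is given here as a sketch: the parameter labels $a_1, \dots, a_5$ and the normalization convention are assumptions chosen to match the $a_i$ parameters referred to in the text, and should be checked against [13].

```latex
W(\Delta E) = \frac{1}{\sqrt{2\pi}\,(a_2 + a_3 a_5)}
\left[ \exp\!\left(-\frac{(\Delta E - a_1)^2}{2 a_2^2}\right)
     + a_3 \exp\!\left(-\frac{(\Delta E - a_4)^2}{2 a_5^2}\right) \right],
\qquad \Delta E = E_{\mathrm{parton}} - E_{\mathrm{jet}}
```

With this prefactor the function integrates to one over $\Delta E$, since the two Gaussians contribute $\sqrt{2\pi}\,a_2$ and $\sqrt{2\pi}\,a_3 a_5$ respectively.
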
where the energy dependence of the parameters $a_i$ can be written in the following form [14]:



These parameters can be determined by minimizing a likelihood built from the parton
energies and the matched jet energies in the Monte Carlo sample under consideration. They
must be determined separately in different pseudorapidity regions of the calorimeter to
account for resolution differences in the detector. A library called KLFitter [15] is also
available, which provides these parameters for different particle types and different
$\eta$ regions of the ATLAS detector.
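
A minimal numerical sketch of a double-Gaussian jet transfer function is shown below. The functional form, the parameter names `a1` to `a5` and the numerical values are illustrative assumptions; they are not the fitted values of this analysis or of KLFitter. The integration helper only checks that the assumed form is normalized to unit area.

```python
import math


def double_gaussian_tf(delta_e, a1, a2, a3, a4, a5):
    """Double-Gaussian transfer function W(delta_e), normalized to unit area.

    a1, a2: mean and width of the core Gaussian (assumed labels)
    a3:     relative weight of the tail Gaussian
    a4, a5: mean and width of the tail Gaussian
    """
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * (a2 + a3 * a5))
    core = math.exp(-((delta_e - a1) ** 2) / (2.0 * a2 ** 2))
    tail = a3 * math.exp(-((delta_e - a4) ** 2) / (2.0 * a5 ** 2))
    return norm * (core + tail)


def integrate_tf(params, lo=-400.0, hi=400.0, steps=80000):
    """Trapezoidal integral of the TF over [lo, hi] as a normalization check."""
    h = (hi - lo) / steps
    total = 0.5 * (double_gaussian_tf(lo, *params) + double_gaussian_tf(hi, *params))
    for k in range(1, steps):
        total += double_gaussian_tf(lo + k * h, *params)
    return total * h


# Purely illustrative parameter values in GeV (a3 is dimensionless).
EXAMPLE_PARAMS = (0.0, 10.0, 0.2, 15.0, 30.0)

if __name__ == "__main__":
    print("integral of W over delta_e:", integrate_tf(EXAMPLE_PARAMS))
```

In a real analysis the five parameters would come from the likelihood fit to matched parton and jet energies described above, performed separately for each pseudorapidity region.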

In principle, lepton energies and angles can be parametrized by Gaussians, but in practice
they are assumed to be well measured by the detector, so the TFs for lepton energies and
for all particle angles can be parametrized by delta functions. A Gaussian parametrization
of these quantities would also make the computation of the weights more time consuming.

II. ANALYSIS: 

In this work, the matrix element method and the cut-based method are compared for the mass
reconstruction of a fourth family up-type quark [16], $u_4$, at 7 TeV center-of-mass
energy, using event samples with different signal-to-background (S/B) ratios.

This analysis is based on Monte Carlo events generated with MadGraph/MadEvent [17]
and processed through Pythia [18] for parton showering and hadronization. Finally, the
detector response is simulated by PGS [19]. In this study, the mixing between the fourth
generation and the first SM family is assumed to be 100 percent. Therefore, the dominant
decay channel $u_4 \to W^+ d$ is considered. As signal, the pair production of the up-type
fourth family quark, $u_4$, at a proton-proton collider with a center-of-mass energy of
7 TeV is considered. The full process for signal events is:

$$ pp \to u_4 \bar{u}_4 \to W^- W^+ jj \tag{II.1} $$

where j is a jet originating from a $d$ or $\bar{d}$ quark, and one of the W bosons decays
leptonically whereas the other decays hadronically. For simplicity, only the electronic decay
mode of the W is considered. Therefore, the signal is searched for in the 4j + 1e + MET final
state. As the dominant background, $t\bar{t}$ events in which the top quark pairs decay
semi-leptonically are considered. These background events are also produced with the
MG-ME/Pythia/PGS chain, with CTEQ6L1 [20] as the PDF set.

The Monte Carlo events have been produced for three different input mass values of the
$u_4$ quark: 400, 500 and 600 GeV. These events were required to contain the right number
of jets and leptons in the final state (i.e. 4 jets and 1 electron for this study).

A. Cut-Based Analysis: 

In the cut-based analysis, leptonically decaying W bosons were reconstructed from the
4-momentum of the lepton and the missing transverse momentum. Assuming a massless
neutrino and an on-shell W, the z component of the neutrino momentum and its energy are
obtained by solving the resulting two equations in two unknowns. If the equations can be
solved, the solution with the smallest $|p_z|$ is selected. The rationale behind this
selection is to use the smallest estimated value and thus reduce the error margin. If the
equation set cannot be solved ($\Delta < 0$), the neutrino four-momentum is formed using
the collinearity approximation, i.e. by assuming the same $\eta$ for the neutral and charged
leptons and again a massless neutrino. Hadronically decaying W bosons were reconstructed
using the 4-momenta of two soft jets in each event. The two relevant jets are selected by
considering all jet pairings and selecting the pair which minimizes a $\chi^2$ defined as:



$$ \chi^2 = \frac{(M_{jj} - M_W)^2}{\sigma_W^2} + \frac{(M_{jjj} - M_{j\ell\nu})^2}{\sigma_Q^2} \tag{II.2} $$

where $M_{jj}$ is the invariant mass reconstructed from two jets, $M_{jjj}$ is the invariant
mass reconstructed from three jets, $M_{j\ell\nu}$ is the invariant mass reconstructed from
the lepton, MET and a jet, $\sigma_W$ is the decay width of the W and $\sigma_Q$ is the
decay width of the new heavy quark. The W-jet association ambiguity is resolved by selecting
the combination which yields the smallest difference between the masses of the two
reconstructed quarks in the same event. The $u_4$ invariant mass is obtained by taking the
average of the masses of the hadronically and leptonically decaying $u_4$ quarks. In the
generation step, standard kinematic selection criteria are applied as follows:



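$$
\begin{aligned}
p_{T,e} &> 10\ \mathrm{GeV}, \\
p_{T,j} &> 20\ \mathrm{GeV}, \\
|\eta_e| &< 2.5, \\
\Delta R(e,j) &> 0.4, \\
\Delta R(j,j) &> 0.4, \\
|\eta_j| &< 5
\end{aligned}
\tag{II.3}
$$

where $p_{T,e}$ is the transverse momentum of electrons, $p_{T,j}$ is the transverse momentum
of jets, $|\eta_e|$ and $|\eta_j|$ are the absolute pseudorapidities of electrons and jets,
$\Delta R(e,j)$ is the angular distance between electrons and jets and $\Delta R(j,j)$ is
the angular distance between jets, with $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$.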
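
Returning to the leptonic W reconstruction described at the start of this subsection, the neutrino solution can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: the function name `neutrino_pz`, the tuple layout of the lepton 4-momentum and the nominal W mass value are all hypothetical choices. The on-shell W constraint with a massless neutrino gives a quadratic equation in the neutrino $p_z$; when its discriminant is negative, the collinear approximation (same $\eta$ as the charged lepton) is used instead.

```python
import math

M_W = 80.4  # GeV, assumed nominal W boson mass


def neutrino_pz(lep, met_x, met_y, m_w=M_W):
    """Return (pz, energy) of a massless neutrino from the on-shell W constraint.

    lep is (E, px, py, pz) of the charged lepton; met_x and met_y are the
    missing transverse momentum components, taken as the neutrino pT.
    The lepton must have non-zero transverse momentum.
    """
    e_l, px_l, py_l, pz_l = lep
    pt_nu2 = met_x ** 2 + met_y ** 2
    mu = 0.5 * m_w ** 2 + px_l * met_x + py_l * met_y
    a = e_l ** 2 - pz_l ** 2          # equals pT_l^2 for a massless lepton
    disc = mu ** 2 - a * pt_nu2       # discriminant divided by 4 * E_l^2

    if disc >= 0.0:
        # Two real solutions: keep the one with the smallest |pz|.
        root = e_l * math.sqrt(disc)
        candidates = ((mu * pz_l + root) / a, (mu * pz_l - root) / a)
        pz_nu = min(candidates, key=abs)
    else:
        # Collinear approximation: neutrino shares the lepton pseudorapidity,
        # so pz_nu / pT_nu = pz_l / pT_l.
        pt_l = math.hypot(px_l, py_l)
        pz_nu = math.sqrt(pt_nu2) * pz_l / pt_l

    e_nu = math.sqrt(pt_nu2 + pz_nu ** 2)
    return pz_nu, e_nu
```

Choosing the smaller $|p_z|$ root mirrors the selection rule stated in the text; other analyses sometimes keep both roots and let a later $\chi^2$ decide.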

At the reconstruction step, the $u_4$ mass was fixed to 500 GeV and the $u_4$ invariant mass
was extracted from a sample containing only 15 signal events. The reconstructed mass
histogram for this case is shown in Fig. II.1.
Figure II.1: Invariant mass histogram of $u_4$ with the cut-based method for an input test mass
of 500 GeV. The result is extracted from a pure signal sample which contains only 15 events.

The same procedure has been applied to other samples containing different numbers of
signal and background events. In short, the S/B ratio was scanned from a purely signal
sample down to a mostly background sample, keeping the total number of events the same,
namely 15. The cases scanned are: 13 signal (S) + 2 background (B), 11 S + 4 B, 9 S + 6 B,
7 S + 8 B, 5 S + 10 B and 3 S + 12 B. Invariant mass histograms obtained for these cases
are shown in Fig. II.2.
Figure II.2: Invariant mass histograms obtained from the cut-based analysis for various event
samples with decreasing S/B ratio and the same input signal mass of 500 GeV.



This procedure was also tested with other $u_4$ masses, namely 400 and 600 GeV. The
reconstructed invariant mass histograms for these input masses are shown in Figs. II.3 and II.4.
Figure II.3: The same as Fig. II.2 but for $m_{u_4}$ = 400 GeV.
Figure II.4: The same as Fig. II.2 but for $m_{u_4}$ = 600 GeV.



The input masses and the reconstructed masses using the cut-based technique for the final
states with different S/B ratios are shown in Table I.

Table I: Invariant mass values extracted from the cut-based analysis for various samples which have
different S/B ratios and different input values.

Event sample            Output u4 mass (GeV) for input mass of
                        400 GeV           500 GeV           600 GeV
15 signal               387.3 ± 105       450.7 ± 113.5     558 ± 107
13 signal + 2 backg.    384.9 ± 125.9     446.3 ± 125.4     527.1 ± 140.8
11 signal + 4 backg.    355.7 ± 136.1     422.3 ± 148.3     480.9 ± 177.8
9 signal + 6 backg.     339.6 ± 147.3     387.4 ± 166.6     432.8 ± 200
7 signal + 8 backg.     309 ± 150.9       349.3 ± 179.7     382.7 ± 213.2
5 signal + 10 backg.    283.6 ± 155.1     326.1 ± 188.5     351.4 ± 222.5
3 signal + 12 backg.    247.3 ± 122.6     289.7 ± 187.8     303.1 ± 216.3


One can see from Table I that, even in the case of a pure signal sample, the deviation
from the input values is large, and the most accurate result is obtained for the 400 GeV
input mass. The second interesting point is that samples consisting mostly of background
events also give mass estimates that follow the input mass rather than the top quark mass;
therefore this approach is of little use for discriminating signal from background events,
especially with low statistics.

B. Matrix Element Method Analysis: 

This method relies on the correct calculation of the weights in Eq. (I.1). To ensure their
correct computation, MadWeight, which was developed by the MadGraph Team [17], has
been used. MadWeight is a phase space generator which takes LHCO files [21] as input, is
configured through data cards, and returns likelihood values for the parameter of interest.

In this part, event files for 15 signal, 13 signal + 2 background, 11 signal + 4 background
and so forth are used in MadWeight to estimate the signal mass for three input $u_4$ masses:
400, 500 and 600 GeV. In each case a sample of N = 15 events is processed through MadWeight
for the evaluation of the weights. The mass of the quark is extracted by minimizing
$-\ln L(m_{u_4})$ with respect to $m_{u_4}$.

In this note, the default transfer function in MadWeight has been used. In this setup, the
jet energy is parametrized by a double Gaussian, and all other quantities, such as the angles
of visible particles and the energy of leptons, are assumed to be well measured. This means
that the corresponding transfer functions for lepton energies and angles are given by delta
functions.

As in the cut-based approach, the analysis started from event samples generated
with an input mass of 500 GeV. The likelihood curves obtained for this mass with various
signal and background samples are shown in Fig. II.5.
Figure II.5: Likelihood plots for samples of 15 events containing different S/B ratios, generated
with an input $u_4$ mass of 500 GeV. The mass value of $u_4$ has been extracted from a parabolic
fit to the points around the minimum.

Estimated masses are shown in the legend box of each graph, except for the last one, i.e.
the 3S plot, for which one finds 167.77 GeV. These estimates are extracted from a parabolic
fit to the ($-\ln L$, mass) points obtained from MadWeight. The quoted errors include both
the statistical uncertainty of the likelihood, evaluated by raising $-\ln L$ by 1/2 above
its minimum, which corresponds to a $1\sigma$ deviation, and the errors originating from the
parabola fit. If a wide mass range is scanned, two likelihood minima are obtained (top, $u_4$),
except in the 3S12B case, where only one minimum, corresponding to the top quark mass, is found.
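
The parabolic extraction described above can be sketched in a few lines. This is a minimal illustration, assuming the scan is available as arrays of test masses and $-\ln L$ values; the function name `fit_parabola_minimum` and the scan points in the usage example are hypothetical and are not MadWeight output. Only the statistical part of the error is computed, i.e. the half-width at which the fitted parabola rises by 1/2 above its minimum; the additional fit uncertainty mentioned in the text is not included.

```python
import numpy as np


def fit_parabola_minimum(masses, nll):
    """Fit -ln L with a parabola and return (m_min, sigma).

    sigma is the distance from the minimum at which the fitted parabola
    has risen by 1/2, i.e. the 1-sigma statistical uncertainty.
    """
    masses = np.asarray(masses, dtype=float)
    nll = np.asarray(nll, dtype=float)

    # Center the mass values to keep the polynomial fit well conditioned.
    m0 = masses.mean()
    c2, c1, _c0 = np.polyfit(masses - m0, nll, 2)
    if c2 <= 0.0:
        raise ValueError("fitted parabola has no minimum")

    m_min = m0 - c1 / (2.0 * c2)
    sigma = 1.0 / np.sqrt(2.0 * c2)   # from c2 * sigma**2 = 1/2
    return float(m_min), float(sigma)


if __name__ == "__main__":
    # Synthetic scan points for illustration only.
    scan = np.array([460.0, 480.0, 500.0, 520.0, 540.0])
    values = 0.5 * ((scan - 503.0) / 9.0) ** 2 + 120.0
    print(fit_parabola_minimum(scan, values))
```

Restricting the fit to points close to the minimum, as done in the text, matters in practice because $-\ln L$ is only approximately parabolic there.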

The same procedure has been applied to event samples produced with input masses of
400 and 600 GeV. The resulting curves are shown in Figs. II.6 and II.7, respectively.
Figure II.6: The same as Fig. II.5 but for $m_{u_4}$ = 400 GeV.
Figure II.7: The same as Fig. II.5 but for $m_{u_4}$ = 600 GeV.



The input masses and the reconstructed masses using the matrix element technique for the
final states with different S/B ratios are shown in Table II.

Table II: Matrix Element analysis results obtained for various $u_4$ input masses and event samples
which include various S/B ratios.

Event sample            Output u4 mass (GeV) for input mass of
                        400 GeV           500 GeV           600 GeV
15 signal               393.68 ± 10.50    503.41 ± 8.14     621.05 ± 10.02
13 signal + 2 backg.    386.35 ± 11.30    498.91 ± 10.04    622.97 ± 12.46
11 signal + 4 backg.    383.25 ± 11.20    499.72 ± 11.65    617.15 ± 12.84
9 signal + 6 backg.     377.06 ± 15.80    495.54 ± 15.29    610.34 ± 13.80
7 signal + 8 backg.     369.72 ± 14.33    487.43 ± 17.31    608.88 ± 14.46
5 signal + 10 backg.    351.86 ± 13.92    471.50 ± 24.19    558.50 ± 18.07
3 signal + 12 backg.    166.57 ± 8.01     167.77 ± 8.32     168.30 ± 7.45



By comparing Tables I and II, it can be clearly seen that the MEM gives much smaller
deviations from the input mass values, and much smaller errors, than the cut-based analysis.
In addition, as the number of background events increases, the MEM result approaches the top
quark mass, contrary to the cut-based results.

Furthermore, when the relative deviation from the true value,
(True Value - Reconstructed Value) / True Value, is plotted against the S/B ratio, one
notices that the deviations obtained from the matrix element method are much smaller than
those extracted from the cut-based analysis, especially for S/B values greater than 0.2.




Figure II.8: Comparison of the S/B ratio versus the corresponding errors for both the cut-based
and Matrix Element Method results, for different masses.
As shown in Fig. II.8, the matrix element method becomes less accurate in the region
S/B < 0.2.
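
The relative deviation used in this comparison can be evaluated directly from the tabulated masses. The sketch below does so for the 500 GeV input mass, with the reconstructed values read from the 500 GeV columns of Tables I and II; the function and variable names are illustrative choices, not part of the analysis.

```python
INPUT_MASS = 500.0  # GeV

# Reconstructed masses (GeV) for the 500 GeV input, read from Tables I and II.
SAMPLES = ["15S", "13S+2B", "11S+4B", "9S+6B", "7S+8B", "5S+10B", "3S+12B"]
CUT_BASED = [450.7, 446.3, 422.3, 387.4, 349.3, 326.1, 289.7]
MEM = [503.41, 498.91, 499.72, 495.54, 487.43, 471.50, 167.77]


def relative_deviation(true_value, reconstructed):
    """(true - reconstructed) / true, as defined in the text."""
    return (true_value - reconstructed) / true_value


def deviation_table(input_mass, cut_based, mem, labels):
    """Return one (label, cut-based deviation, MEM deviation) tuple per sample."""
    rows = []
    for label, m_cut, m_mem in zip(labels, cut_based, mem):
        rows.append((label,
                     relative_deviation(input_mass, m_cut),
                     relative_deviation(input_mass, m_mem)))
    return rows


if __name__ == "__main__":
    for label, d_cut, d_mem in deviation_table(INPUT_MASS, CUT_BASED, MEM, SAMPLES):
        print(f"{label:>8s}  cut-based: {d_cut:+.3f}  MEM: {d_mem:+.3f}")
```

For every sample except 3S+12B the MEM deviation is smaller in magnitude than the cut-based one; in the 3S+12B sample the MEM result sits at the top quark mass, so its deviation from the input mass is the larger of the two.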

III. CONCLUSION: 

This study shows that, for data samples containing events with various signal-to-background
ratios, the matrix element method gives substantially better values for the parameter of
interest (the mass of the fourth family up-type quark in this analysis). As a second result,
the MEM is also a powerful tool for discriminating signal and background events, even with
low statistics, if S/B > 0.2.